Abstract

This paper describes the principles underlying an efficient implementation of a 
lazy functional language, compiling to code for ordinary computers. It is based 
on combinator-like graph reduction: the user-defined functions are used as rewrite
rules in the graph. Each function is compiled into an instruction sequence for an
abstract graph reduction machine, called the G-machine; the code reduces a function
application graph to its value. The G-machine instructions are then translated into
target code. Speed improvements by almost two orders of magnitude over previous 
lazy evaluators have been measured; we provide some performance figures. 
1 Background



Functional programming is emerging as an alternative to the conventional imperative style of programming [Lan66], [Bac78]. Lazy evaluation (call by need, normal order evaluation) has been proposed as a method for executing functional programs. Among its advantages are that unbounded data structures, e.g. infinite lists, can easily be handled, and that it makes interactive input/output possible in functional programs [Fri76]. Though functional programming languages have many pleasing properties, an obstacle to their wider use has been the lack of efficient implementations.

Our work is based on Turner's combinator approach [Tur79], where programs are transformed into expressions containing the combinators S, K, I etc. from combinatory logic, thus removing all variables from the program. A combinator expression is evaluated in the 'SKI-machine' using normal order graph reduction. A problem with combinators is that each combinator performs only a rather small interpretive step, and combinator expressions tend to become very cumbersome for non-trivial programs.

Our lazy evaluation method is similar to the combinator reduction regime, but instead of using a standard, fixed set of combinators, each user-defined function is used as a 'combinator', i.e., a rewrite rule for the graph. Functions are compiled into code sequences for an abstract graph reduction machine, called the G-machine, with instructions that explicitly construct and manipulate expression graphs to reduce expressions to their values; both shared and cyclic graphs can be constructed directly. Target code generation for ordinary computers from the G-machine code is rather straightforward. One might say that the compiler constructs, from each program, a specialised combinator interpreter coded in machine language.



2 Graph reduction

In our graph reduction approach a program is an expression whose value will, in general, appear as a stream of basic values (integers, booleans etc.) on the output file. Expressions are evaluated by normal order graph reduction, which is carried out by transforming the graph until it has been reduced to its canonical form. A canonical form is an expression which cannot be reduced further at the outermost level (even though its subexpressions may be further reducible). In this paper canonical forms are integer and boolean constants, list expressions e1.e2 with arbitrary expressions e1 and e2, and function applications f e1 ... em where f is a function that takes more than m curried arguments; an application can be reduced only if all curried arguments to the function are present. Thus, in general, for an expression to become completely reduced, its subexpressions, for instance the elements of a list, must also be reduced to canonical form. Evaluation of a function application amounts to using the corresponding function definition as a graph rewrite rule, repeatedly rewriting the application graph to an instance of the right hand side of the function definition, with arguments substituted for formal parameters, until a canonical form has been reached.

For illustration, consider the following functional program, whose value is the infinite list of natural numbers.

from n = n . from (succ n)
from 0
Figure 1: Graph reduction of from 0, panels (a) to (f). The output is shown to the left of each graph.

In this program succ is the predefined successor function, '.' is the infix list construction operator, and juxtaposition denotes function application. Graph reduction of this program is shown in figure 1; in the figures, function application is denoted by @.

The start expression 1(a) is transformed to 1(b) using the rewrite rule for the function from as defined above, with a pointer to the integer 0 substituted for the parameter n. In 1(b) the expression is in canonical form, and so is its head part 0. The head value can now be output and dropped from the graph, 1(c). Again the rewrite rule for from is applied to the graph, 1(d), which is now of the form e.e' and hence canonical. The next step is to reduce the head part succ 0 to its canonical form using the rewrite rule for succ, 1(e). The resulting integer 1 is then written on the output and dropped from the graph.

The execution continues in this way ad infinitum. Note that the shared expression succ 0 has been replaced in 1(e) by its value. In general, expression graphs are evaluated, i.e., reduced to their canonical forms, at most once, and all expressions that share a particular subexpression benefit from its evaluation (call by need).
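The following Python sketch illustrates call by need on this example. It is not the G-machine: the Node class, the hard-wired rewrite rules for from and succ, and the reduction counters are assumptions made for illustration. Each rewrite overwrites the application node in place, so a shared subexpression such as succ 0 is reduced only once.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Node:
    tag: str       # "INT", "CONS", "AP" or "FUN"
    a: Any = None  # INT value, CONS head, AP function, or FUN name
    b: Any = None  # CONS tail or AP argument

    def overwrite(self, other: "Node") -> None:
        # Update in place: every pointer to this node now sees the new value.
        self.tag, self.a, self.b = other.tag, other.a, other.b


reductions: Dict[str, int] = {"succ": 0, "from": 0}


def whnf(n: Node) -> Node:
    """Rewrite n until it is in canonical form (here INT or CONS)."""
    while n.tag == "AP":
        fun, arg = n.a, n.b
        if fun.a == "succ":
            reductions["succ"] += 1
            n.overwrite(Node("INT", whnf(arg).a + 1))
        elif fun.a == "from":
            reductions["from"] += 1
            # from n = n . from (succ n); the node `arg` is shared, not copied
            nxt = Node("AP", Node("FUN", "succ"), arg)
            n.overwrite(Node("CONS", arg, Node("AP", Node("FUN", "from"), nxt)))
        else:
            raise ValueError(f"no rewrite rule for {fun.a}")
    return n


def take(n: Node, k: int) -> List[int]:
    """Output the first k list elements, evaluating each head before printing it."""
    out = []
    for _ in range(k):
        cell = whnf(n)
        out.append(whnf(cell.a).a)
        n = cell.b  # continue with the still-unevaluated tail
    return out


program = Node("AP", Node("FUN", "from"), Node("INT", 0))
print(take(program, 5))  # [0, 1, 2, 3, 4]
print(reductions)        # {'succ': 4, 'from': 5}
```

Because the head of each new cell is the same node that the next succ application points to, each list element after the first costs exactly one succ reduction.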



3 An introductory example of G-machine execution 

In our graph reduction scheme each function definition is compiled into a sequence of G-machine instructions. Each graph rewrite, according to a function definition, is carried out by executing the code for that function. Here we illustrate the execution of the G-machine with the reduction step (c) to (d) from figure 1. The G-machine state transitions are shown in figure 2.

Before the reduction starts, a pointer to the expression graph is at the top of a pointer stack, figure 2(a) (the stack grows downwards). Reduction is started by executing the G-machine instruction EVAL, in this case issued by the print mechanism. EVAL causes a new stack frame to be created with the previously topmost pointer as its single entry, saving the old stack on another stack called the dump (not shown in figure 2). Then, in the unwind state, pointers to the application nodes of the left 'backbone' are pushed until a function node has been reached, (a) to (c). The stack is then rearranged so that the topmost pointer points to the argument of from, while the second pointer from the top is left untouched and thus points to the apply node that is to be updated with the result of the application. The G-machine now starts to execute the code for from, which is (see section 5.2 and table 3 for how we obtain it)

from: PUSH 0; PUSHFUN from; PUSHFUN succ; PUSH 3; MKAP;
      MKAP; CONS; UPDATE 2; RET 1.

Except for the last two instructions, this instruction sequence is essentially a postfix representation of the right hand side of from. The PUSH m instruction pushes the m-th pointer of the stack, counted from the top starting with 0; note that different offsets have to be used to push pointers to the formal parameter n, depending on the current depth of the stack (the reason for this is explained in section 5.5). The PUSHFUN succ instruction pushes a pointer to a succ function node. MKAP constructs an application node with the two topmost nodes as subparts; similarly for CONS. After the graph for the right hand side of the definition of from has been constructed, figure 2(k), the cons node is copied onto the result apply node by the UPDATE 2 instruction, thereby transforming from (succ 0) into (succ 0).from (succ (succ 0)) in the graph, which is a canonical expression. The RET 1 instruction pops one element from the stack, and since the top graph is now in canonical form, the old stack is restored from the dump and control is returned to the instruction following EVAL. In general, had the top graph not been in canonical form but an application node or a function node, the G-machine would, instead of restoring the old stack and returning, have reentered the unwind state to continue the reduction of the new expression graph.
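As a rough check of the offsets and argument order described above, the sketch below replays the code for from on a toy Python model of the pointer stack and graph. The Node class and the tuple encoding of instructions are assumptions; EVAL, the dump and unwinding are not modelled, and RET is reduced to popping.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(eq=False)  # compare nodes by identity: they are graph vertices
class Node:
    tag: str       # "INT", "FUN", "AP" or "CONS"
    a: Any = None  # INT value, FUN name, AP function, CONS head
    b: Any = None  # AP argument, CONS tail


def run(code: List[Tuple], stack: List[Node]) -> List[Node]:
    """Execute straight-line G-code; stack[-1] is the top of the pointer stack."""
    for op, *args in code:
        if op == "PUSH":        # push the m-th pointer, counted from the top (0)
            stack.append(stack[-1 - args[0]])
        elif op == "PUSHFUN":   # push a pointer to a new function node
            stack.append(Node("FUN", args[0]))
        elif op in ("MKAP", "CONS"):
            top, below = stack.pop(), stack.pop()   # argument/tail, function/head
            stack.append(Node("AP" if op == "MKAP" else "CONS", below, top))
        elif op == "UPDATE":    # copy the top node onto the node m below it, then pop
            target = stack[-1 - args[0]]
            src = stack.pop()
            target.tag, target.a, target.b = src.tag, src.a, src.b
        elif op == "RET":       # only the popping part of RET is modelled here
            del stack[len(stack) - args[0]:]
        else:
            raise ValueError(f"instruction not modelled: {op}")
    return stack


FROM_CODE = [("PUSH", 0), ("PUSHFUN", "from"), ("PUSHFUN", "succ"), ("PUSH", 3),
             ("MKAP",), ("MKAP",), ("CONS",), ("UPDATE", 2), ("RET", 1)]

# State after unwinding and rearranging: the apply node below, the argument on top.
arg = Node("AP", Node("FUN", "succ"), Node("INT", 0))   # succ 0
root = Node("AP", Node("FUN", "from"), arg)             # from (succ 0)
stack = run(FROM_CODE, [root, arg])
```

Afterwards root holds the cons node (succ 0).from (succ (succ 0)), and the node for succ 0 is shared between the head of the list and the argument of the new succ application.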

4 Short-circuiting graph reduction 

We have previously indicated that we do graph reduction by repeatedly rewriting the graph to the right hand sides of functions. Indeed we can use G-machine code that does precisely this; in most cases, however, we can take considerable shortcuts and do away with many intermediate graph rewritings.

Consider the function definition succ n = n + 1. If we compile it into code that constructs the graph for the right hand side, add n 1, then when executed the expression graph succ e will be rewritten into add e 1, thus leaving the task of further reduction to add, which will reduce the expression to its integer value. Much efficiency can be gained if we instead compile succ into code that first reduces its parameter n, computes the value of n + 1, and then overwrites the apply node with an integer node holding this value. This avoids the construction of the intermediate graph add e 1. A code sequence for the function succ is accordingly
Figure 2: G-machine reduction of from (succ 0). Panels (a) to (m) step through EVAL, unwind, rearrange, PUSH 0, PUSHFUN from, PUSHFUN succ, PUSH 3, MKAP, MKAP, CONS, UPDATE 2 and RET 1.
Figure 3: Shortcut evaluation of function succ (steps: unwind, PUSH 0, EVAL, GET, PUSHBASIC 1, ADD, MKINT, UPDATE 2, RET 1).

succ: PUSH 0; EVAL; GET; PUSHBASIC 1; ADD; 
MKINT; UPDATE 2; RET 1. 

The execution of this code sequence is shown in figure 3. The addition is done on a separate stack for basic values, called V, with the instructions GET and MKINT for transferring values from the graph to V and from V back to the graph, respectively. PUSHBASIC pushes a basic value constant onto the V stack.
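A minimal sketch of the two-stack idea, using a toy node model in the same style as before (the names run, Node and allocated are hypothetical). EVAL is simplified to a check that the argument is already an integer node; the point is that the addition itself happens on V and only MKINT allocates a graph node.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(eq=False)
class Node:
    tag: str
    a: Any = None
    b: Any = None


allocated = 0  # number of graph nodes created by the code


def run(code, s: List[Node], v: List[int]) -> None:
    """Run G-code with a pointer stack s and a basic value stack v (tops at the end)."""
    global allocated
    for op, *args in code:
        if op == "PUSH":
            s.append(s[-1 - args[0]])
        elif op == "EVAL":
            # Assumption for this sketch: the argument is already an INT node.
            assert s[-1].tag == "INT", "full EVAL (unwinding) is not modelled"
        elif op == "GET":         # graph -> V
            v.append(s.pop().a)
        elif op == "PUSHBASIC":   # constant -> V
            v.append(args[0])
        elif op == "ADD":         # i2.i1.v => (i1 + i2).v
            i2, i1 = v.pop(), v.pop()
            v.append(i1 + i2)
        elif op == "MKINT":       # V -> graph: the only node allocation
            allocated += 1
            s.append(Node("INT", v.pop()))
        elif op == "UPDATE":
            target = s[-1 - args[0]]
            src = s.pop()
            target.tag, target.a, target.b = src.tag, src.a, src.b
        elif op == "RET":
            del s[len(s) - args[0]:]
        else:
            raise ValueError(op)


SUCC_CODE = [("PUSH", 0), ("EVAL",), ("GET",), ("PUSHBASIC", 1), ("ADD",),
             ("MKINT",), ("UPDATE", 2), ("RET", 1)]

arg = Node("INT", 41)
root = Node("AP", Node("FUN", "succ"), arg)   # succ 41
s, v = [root, arg], []
run(SUCC_CODE, s, v)
print(root.tag, root.a, allocated)  # INT 42 1
```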

Similar reasoning can be applied to all other predefined primitive functions. If the right hand side is an if-expression, for example, then the code would do the following: compute the value of the condition; if it is true, update the proper apply node with the value of the then-expression, otherwise with the value of the else-expression.

This line of reasoning is systematized in the next section by having different compilation schemes: one giving code that computes the value of an expression, and one giving code that constructs the graph of an expression. This more direct method is significantly faster; in our compiler implementation we have measured a speedup of about a factor of ten for some typical programs, compared to naive graph reduction.



5 Technical details of the abstract machine and compiler

In this section we give a complete set of compilation rules for a simple functional language, compiling to G-machine code. We also give an abstract description of the G-machine, describing the effects of the G-machine instructions on a machine state.

5.1 Source Language 

A program in the language described here consists of a set of recursive functions and an expression whose value is the value of the program, as summarised in table 1. Normal order evaluation is assumed. Each function fi takes ni curried arguments, and the free variables of ei are in the set {x1, ..., xni}. Operators such as +, - etc. are viewed as syntactic sugar for applications of the predefined functions add, sub etc., of which we deal with the ones given in table 2.

Table 1: Syntax of programs

program ::= f1 x1 ... xn1 = e1                (function definitions)
            ...
            fm x1 ... xnm = em
            e0                                (the value of the program)

e ::= identifiers | constants | e e
    | let x1 = e1 and ... and xm = em in e    (multiple simultaneous local definitions)
    | letrec x1 = e1 and ... and xm = em in e (multiple simultaneous local recursive definitions)



Table 2: Predefined functions

add sub mul div     (binary arithmetic operators)
neg                 (unary negation)
lt le eq ne ge gt   (binary relational operators)
and or              (conditional and, or)
not                 (logical negation)
cons                (binary list construction)
hd tl               (unary head and tail of a list)
null                (unary test on empty list)
if                  (ternary if-then-else)



Note that there are no lambda expressions in the syntax of expressions; functions are defined only globally. Functional programs with local function definitions and lambda expressions with free variables can be transformed into the form above using supercombinators [Hug82]. An algorithm to the same effect is used in our compiler implementation; however, the program resulting from our transformation does not exhibit 'full laziness', which is the main concern of Hughes' work.
5.2 Compilation rules

The abstract compiler given in table 3 is subdivided into 4 compilation schemes: 

F[f x1 ... xm = e] gives the code for a function, which reduces the graph of an application to canonical form.

C[e] r n gives code that constructs the graph of e and leaves a pointer to the result on the top of the stack.

E[e] r n gives code that computes the value, i.e. canonical form, of e and leaves a pointer to the value on the top of the stack. It yields the same result as C[e] r n followed by an EVAL instruction, and embodies the short-circuiting described in section 4.

B[e] r n computes the basic value of e and leaves the result on the basic value stack V, yielding the same result as E[e] r n followed by a GET instruction. The idea behind B is to avoid constructing a new node for each intermediate result in an arithmetic or logical expression. The value is transferred to the graph only when the entire expression has been evaluated, by the MKINT or MKBOOL instruction.



Table 3: Compilation rules

F[f x1 ... xm = e] = E[e] r (m+1); UPDATE (m+1); RET m,
    where r = [x1 = m+1, x2 = m, ..., xm = 2]

Scheme E: Evaluate

 1. E[i] r n             = PUSHINT i
 2. E[b] r n             = PUSHBOOL b
 3. E[nil] r n           = PUSHNIL
 4. E[x] r n             = PUSH (n - r(x)); EVAL
 5. E[f] r n             = PUSHFUN f
 6. E[add e1 e2] r n     = B[add e1 e2] r n; MKINT, and similarly for sub, mul, div
 7. E[neg e] r n         = B[neg e] r n; MKINT
 8. E[eq e1 e2] r n      = B[eq e1 e2] r n; MKBOOL, and similarly for lt, gt, ne, ge, le
 9. E[not e] r n         = B[not e] r n; MKBOOL
10. E[and e1 e2] r n     = E[if e1 e2 false] r n
11. E[or e1 e2] r n      = E[if e1 true e2] r n
12. E[cons e1 e2] r n    = C[e1] r n; C[e2] r (n+1); CONS
13. E[null e] r n        = E[e] r n; NULL; MKBOOL
14. E[hd e] r n          = E[e] r n; HD; EVAL, similarly for tl
15. E[if e1 e2 e3] r n   = B[e1] r n; JFALSE l1; E[e2] r n; JMP l2; LABEL l1; E[e3] r n; LABEL l2,
                           where l1 and l2 are new unique labels
16. E[let d in e] r n    = Clet[d] r n; E[e] r' n'; SLIDE (n' - n), where (r', n') = Xr[d] r n
17. E[letrec d in e] r n = Cletrec[d] r' n'; E[e] r' n'; SLIDE (n' - n), where (r', n') = Xr[d] r n
18. E[e] r n             = C[e] r n; EVAL, otherwise

Scheme B: Evaluate basic value

 1. B[i] r n             = PUSHBASIC i
 2. B[b] r n             = PUSHBASIC b
 3. B[add e1 e2] r n     = B[e1] r n; B[e2] r n; ADD, similarly for sub, mul, div, eq, ne, lt, gt, ge, le
 4. B[neg e] r n         = B[e] r n; NEG
 5. B[not e] r n         = B[e] r n; NOT
 6. B[null e] r n        = E[e] r n; NULL
 7. B[if e1 e2 e3] r n   = B[e1] r n; JFALSE l1; B[e2] r n; JMP l2; LABEL l1; B[e3] r n; LABEL l2,
                           where l1 and l2 are new unique labels
 8. B[let d in e] r n    = Clet[d] r n; B[e] r' n'; POP (n' - n), where (r', n') = Xr[d] r n
 9. B[letrec d in e] r n = Cletrec[d] r' n'; B[e] r' n'; POP (n' - n), where (r', n') = Xr[d] r n
10. B[e] r n             = E[e] r n; GET, otherwise

Scheme C: Construct graph

 1. C[i] r n             = PUSHINT i
 2. C[b] r n             = PUSHBOOL b
 3. C[nil] r n           = PUSHNIL
 4. C[f] r n             = PUSHFUN f
 5. C[x] r n             = PUSH (n - r(x))
 6. C[cons e1 e2] r n    = C[e1] r n; C[e2] r (n+1); CONS
 7. C[e1 e2] r n         = C[e1] r n; C[e2] r (n+1); MKAP, if not matched above
 8. C[let d in e] r n    = Clet[d] r n; C[e] r' n'; SLIDE (n' - n), where (r', n') = Xr[d] r n
 9. C[letrec d in e] r n = Cletrec[d] r' n'; C[e] r' n'; SLIDE (n' - n), where (r', n') = Xr[d] r n

Miscellaneous schemes for local definitions

Xr[v1 = e1 and ... and vm = em] r n
    = (r[v1 = n+1, ..., vi = n+i, ..., vm = n+m], n+m)
Clet[v1 = e1 and ... and vm = em] r n
    = C[e1] r n; ...; C[ei] r (n+i-1); ...; C[em] r (n+m-1)
Cletrec[v1 = e1 and ... and vm = em] r n
    = ALLOC m; C[e1] r n; UPDATE m; ...; C[ei] r n; UPDATE (m+1-i); ...; C[em] r n; UPDATE 1
    (here r and n are the environment and stack depth already extended by Xr, as in rules E17, B9 and C9)
In addition, there are three auxiliary functions used for local definitions: Xr returns a pair consisting of the extended environment and the new stack depth, and Clet and Cletrec give code to extend the stack with the local definitions. In the translation schemes, r is a mapping from identifiers of parameters to their location on the stack, and n is the current depth of the stack. Below we show the compilation of the function f x = x.f x.

F[f x = cons x (f x)]
  = E[cons x (f x)] [x = 2] 2; UPDATE 2; RET 1
  = C[x] [x = 2] 2; C[f x] [x = 2] 3; CONS; UPDATE 2; RET 1
  = PUSH 0; C[f] [x = 2] 3; C[x] [x = 2] 4; MKAP; CONS; UPDATE 2; RET 1
  = PUSH 0; PUSHFUN f; PUSH 2; MKAP; CONS; UPDATE 2; RET 1.
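A partial sketch of schemes F, C, E and B in Python, restricted to integers, variables, function names, application, cons and the arithmetic operators (no booleans, if, let or letrec). The tuple encoding of expressions and the function names are assumptions. It reproduces the derivation above, as well as the code sequences for from and succ given in sections 3 and 4.

```python
from typing import Dict, List, Union

# Assumed AST encoding: an int is a constant, a str is a variable (if bound in r)
# or a global function name, and a tuple (e0, e1, ..., ek) is the curried
# application e0 e1 ... ek.
Expr = Union[int, str, tuple]
Env = Dict[str, int]
ARITH = {"add": "ADD", "sub": "SUB", "mul": "MUL", "div": "DIV"}


def C(e: Expr, r: Env, n: int) -> List[str]:
    """Scheme C: construct the graph of e, leaving a pointer on top of S."""
    if isinstance(e, int):
        return [f"PUSHINT {e}"]
    if isinstance(e, str):
        return [f"PUSH {n - r[e]}"] if e in r else [f"PUSHFUN {e}"]
    if e[0] == "cons" and len(e) == 3:
        return C(e[1], r, n) + C(e[2], r, n + 1) + ["CONS"]
    *fun, arg = e                       # (e0 ... ek-1) applied to ek
    f = fun[0] if len(fun) == 1 else tuple(fun)
    return C(f, r, n) + C(arg, r, n + 1) + ["MKAP"]


def B(e: Expr, r: Env, n: int) -> List[str]:
    """Scheme B: leave the basic value of e on V; the depth of S is unchanged."""
    if isinstance(e, int):
        return [f"PUSHBASIC {e}"]
    if isinstance(e, tuple) and e[0] in ARITH and len(e) == 3:
        return B(e[1], r, n) + B(e[2], r, n) + [ARITH[e[0]]]
    return E(e, r, n) + ["GET"]


def E(e: Expr, r: Env, n: int) -> List[str]:
    """Scheme E: leave a pointer to the canonical form of e on top of S."""
    if isinstance(e, int):
        return [f"PUSHINT {e}"]
    if isinstance(e, str):
        return [f"PUSH {n - r[e]}", "EVAL"] if e in r else [f"PUSHFUN {e}"]
    if e[0] in ARITH and len(e) == 3:
        return B(e, r, n) + ["MKINT"]
    if e[0] == "cons" and len(e) == 3:
        return C(e[1], r, n) + C(e[2], r, n + 1) + ["CONS"]
    return C(e, r, n) + ["EVAL"]


def F(params: List[str], body: Expr) -> str:
    """Scheme F for f x1 ... xm = body, with r = [x1 = m+1, ..., xm = 2]."""
    m = len(params)
    r = {x: m + 1 - i for i, x in enumerate(params)}
    return "; ".join(E(body, r, m + 1) + [f"UPDATE {m + 1}", f"RET {m}"])


print(F(["x"], ("cons", "x", ("f", "x"))))
# PUSH 0; PUSHFUN f; PUSH 2; MKAP; CONS; UPDATE 2; RET 1
```

Note that B compiles both operands at the same depth n, since GET removes the pointer that E pushed before the next operand is compiled.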

5.3 The abstract machine 

A state in the abstract G-machine is a 7-tuple (O, C, S, V, G, E, D), where

O is the output produced so far, as shown in the example in figure 1. It consists of a sequence of integers and booleans. In an actual implementation O is printed on standard output.

C is the G-code sequence currently being executed.

S is a stack of node names, i.e., pointers into the graph.

V is a stack of basic values, i.e., integers and booleans on which the arithmetic and logical operations are performed, as shown in section 4.

G is the graph: a mapping from node names to nodes. We have nodes of the following types:

  INT i        integer nodes,
  BOOL b       boolean nodes,
  NIL          empty list nodes,
  CONS n1 n2   list nodes, where n1 is a pointer to the head graph and n2 is a pointer to the tail graph,
  AP n1 n2     application nodes, where n1 is a pointer to the function graph and n2 is a pointer to the argument graph,
  FUN f        a node with a reference to the compiled function f,
  HOLE         a node which is to be filled in with another value later; it is used while constructing cyclic graphs for letrec expressions.

E is a global environment, which is a mapping from function names to pairs consisting of the number of curried arguments of the function and its code sequence. E corresponds to the code segment in conventional machines and is constant throughout the execution of the program.

D is a dump used for recursive calls to EVAL: a stack of pairs consisting of

  • a stack of node names: S before EVAL,
  • a G-code sequence: C before EVAL.

Table 4 summarises the state transition rules for the G-machine instructions used in 
the compilation rules given in table 3. 
Table 4: State transition rules for G-machine instructions

 1. (o, PRINT.c, n.s, v, G[n = INT i], E, D)
      => (o;i, c, s, v, G[n = INT i], E, D)
 2. (o, PRINT.c, n.s, v, G[n = BOOL b], E, D)
      => (o;b, c, s, v, G[n = BOOL b], E, D)
 3. (o, PRINT.c, n.s, v, G[n = CONS n1 n2], E, D)
      => (o, EVAL.PRINT.EVAL.PRINT.c, n1.n2.s, v, G[n = CONS n1 n2], E, D)
 4. (o, PRINT.c, n.s, v, G[n = NIL], E, D)
      => (o, c, s, v, G[n = NIL], E, D)
 5. (o, EVAL.c, n.s, v, G[n = AP n1 n2], E, D)
      => (o, UNWIND.(), n.(), v, G[n = AP n1 n2], E, (c, s).D)
 6. (o, EVAL.c, n.s, v, G[n = INT i], E, D)
      => (o, c, n.s, v, G[n = INT i], E, D),
    similarly for nodes BOOL b, NIL, CONS n1 n2 and FUN f
 7. (o, UNWIND.(), n.s, v, G[n = AP n1 n2], E, D)
      => (o, UNWIND.(), n1.n.s, v, G[n = AP n1 n2], E, D)
 8. (o, UNWIND.(), n0.n1 ... nk.s, v, G[n0 = FUN f, n1 = AP n1' n1'', ..., nk = AP nk' nk''], E[f = (k, c)], D)
      => (o, c, n1'' ... nk''.nk.s, v, G[n0 = FUN f, n1 = AP n1' n1'', ..., nk = AP nk' nk''], E[f = (k, c)], D)
 9. (o, UNWIND.(), n0.n1 ... nk.(), v, G[n0 = FUN f], E[f = (a, c)], (c', s').D) and k < a
      => (o, c', nk.s', v, G[n0 = FUN f], E[f = (a, c)], D)
10. (o, RET m.c, n1 ... nm.n.(), v, G[n = INT i], E, (c', s').D)
      => (o, c', n.s', v, G[n = INT i], E, D),
    similarly for nodes BOOL b, NIL and CONS n1 n2
11. (o, RET m.c, n1 ... nm.n.s, v, G[n = AP n1' n2'], E, D)
      => (o, UNWIND.(), n.s, v, G[n = AP n1' n2'], E, D),
    similarly for n = FUN f
12. (o, PUSHINT i.c, s, v, G, E, D) => (o, c, n'.s, v, G[n' = INT i], E, D)
13. (o, PUSHBOOL b.c, s, v, G, E, D) => (o, c, n'.s, v, G[n' = BOOL b], E, D)
14. (o, PUSHNIL.c, s, v, G, E, D) => (o, c, n'.s, v, G[n' = NIL], E, D)
15. (o, PUSHFUN f.c, s, v, G, E, D) => (o, c, n'.s, v, G[n' = FUN f], E, D)
16. (o, PUSH m.c, n0 ... nm.s, v, G, E, D) => (o, c, nm.n0 ... nm.s, v, G, E, D)
17. (o, MKINT.c, s, i.v, G, E, D) => (o, c, n'.s, v, G[n' = INT i], E, D)
18. (o, MKBOOL.c, s, b.v, G, E, D) => (o, c, n'.s, v, G[n' = BOOL b], E, D)
19. (o, MKAP.c, n1.n2.s, v, G, E, D) => (o, c, n'.s, v, G[n' = AP n2 n1], E, D)
20. (o, CONS.c, n1.n2.s, v, G, E, D) => (o, c, n'.s, v, G[n' = CONS n2 n1], E, D)
21. (o, ALLOC m.c, s, v, G, E, D)
      => (o, c, n1' ... nm'.s, v, G[n1' = HOLE, ..., nm' = HOLE], E, D)
22. (o, UPDATE m.c, n0 ... nm.s, v, G[n0 = N0, nm = Nm], E, D)
      => (o, c, n1 ... nm.s, v, G[n0 = N0, nm = N0], E, D)
23. (o, SLIDE m.c, n0.n1 ... nm.s, v, G, E, D) => (o, c, n0.s, v, G, E, D)
24. (o, GET.c, n.s, v, G[n = INT i], E, D) => (o, c, s, i.v, G[n = INT i], E, D)
25. (o, GET.c, n.s, v, G[n = BOOL b], E, D) => (o, c, s, b.v, G[n = BOOL b], E, D)
26. (o, PUSHBASIC i.c, s, v, G, E, D) => (o, c, s, i.v, G, E, D)
27. (o, ADD.c, s, i2.i1.v, G, E, D) => (o, c, s, (i1 + i2).v, G, E, D),
    similarly for SUB, MUL, DIV, EQ, NE, LT, GT, LE and GE, the last six putting boolean values on V
28. (o, NEG.c, s, i.v, G, E, D) => (o, c, s, (-i).v, G, E, D)
29. (o, NOT.c, s, b.v, G, E, D) => (o, c, s, (not b).v, G, E, D)
30. (o, JFALSE l.c, s, true.v, G, E, D) => (o, c, s, v, G, E, D)
31. (o, JFALSE l.c, s, false.v, G, E, D) => (o, JMP l.c, s, v, G, E, D)
32. (o, JMP l. ... .LABEL l.c, s, v, G, E, D) => (o, c, s, v, G, E, D)
33. (o, LABEL l.c, s, v, G, E, D) => (o, c, s, v, G, E, D)
34. (o, HD.c, n.s, v, G[n = CONS n1 n2], E, D) => (o, c, n1.s, v, G[n = CONS n1 n2], E, D),
    similarly for TL
35. (o, NULL.c, n.s, v, G[n = CONS n1 n2], E, D) => (o, c, s, false.v, G[n = CONS n1 n2], E, D)
36. (o, NULL.c, n.s, v, G[n = NIL], E, D) => (o, c, s, true.v, G[n = NIL], E, D)
37. (o, POP m.c, n1 ... nm.s, v, G, E, D) => (o, c, s, v, G, E, D)
((), c0, (), (), {}, E0, ())

where c0 = E[e0] r0 0; PRINT
and   E0 = { f1 = (n1, F[f1 x1 ... xn1 = e1]),
             ...
             fm = (nm, F[fm x1 ... xnm = em]),
             add = (2, F[p x y = add x y]),
             sub = (2, F[p x y = sub x y]),
             ... }

Figure 4: Initial state of the G-machine.

In a G-machine state, () denotes an empty stack or an empty code sequence. The semicolon appends values onto an output sequence. A period is used as infix cons for instruction sequences and as push for stacks. Updating of the graph is written as e.g. G[n = INT i]. If there is a node named n previously in G, then the node n is updated with a new value; otherwise a new node is created. This notation is also used in pattern matching situations; for instance, state transition rule 1 is applicable if the top of the stack points to an integer node. Instructions with parameters bind tighter than the period, e.g. PUSH m.c parses as (PUSH m).c. A node name that occurs only in the right hand side of a transition rule is considered to be new and unique, e.g. n' in transition rule 12. G-machine states that do not match any rule are considered to be run-time errors.
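To make the rule notation concrete, here is a hedged Python sketch of a few transitions from table 4 (rules 12, 16, 17, 19, 22, 23, 24, 26 and 27) as a pure function on the 7-tuple state. Stacks and code are tuples with the top at index 0, node names come from a counter, and an instruction with no matching rule raises an exception, mirroring the run-time-error convention; all of these representation choices are assumptions.

```python
from itertools import count

# State = (o, c, s, v, G, E, D). Stacks and code are tuples with the top or the
# next instruction at index 0, mirroring n.s and PUSH m.c in table 4.
# G maps node names (ints from a counter) to node tuples such as ("INT", 3).
_names = count()


def step(state):
    o, c, s, v, G, E, D = state
    (op, *args), c = c[0], c[1:]
    G = dict(G)  # each transition yields a new graph mapping
    if op == "PUSHINT":                          # rule 12
        n = next(_names)
        G[n] = ("INT", args[0])
        return (o, c, (n,) + s, v, G, E, D)
    if op == "PUSH":                             # rule 16
        return (o, c, (s[args[0]],) + s, v, G, E, D)
    if op == "MKAP":                             # rule 19: n1.n2.s, n' = AP n2 n1
        n = next(_names)
        G[n] = ("AP", s[1], s[0])
        return (o, c, (n,) + s[2:], v, G, E, D)
    if op == "UPDATE":                           # rule 22: nm takes n0's contents
        G[s[args[0]]] = G[s[0]]
        return (o, c, s[1:], v, G, E, D)
    if op == "SLIDE":                            # rule 23: n0.n1...nm.s => n0.s
        return (o, c, (s[0],) + s[args[0] + 1:], v, G, E, D)
    if op == "GET":                              # rule 24 (INT nodes only)
        return (o, c, s[1:], (G[s[0]][1],) + v, G, E, D)
    if op == "PUSHBASIC":                        # rule 26
        return (o, c, s, (args[0],) + v, G, E, D)
    if op == "ADD":                              # rule 27: i2.i1.v => (i1 + i2).v
        return (o, c, s, (v[1] + v[0],) + v[2:], G, E, D)
    if op == "MKINT":                            # rule 17
        n = next(_names)
        G[n] = ("INT", v[0])
        return (o, c, (n,) + s, v[1:], G, E, D)
    raise RuntimeError(f"no transition rule matches {op}")  # run-time error


def run(code):
    state = ((), tuple(code), (), (), {}, {}, ())
    while state[1]:                              # stop when C is empty
        state = step(state)
    return state


o, c, s, v, G, E, D = run([("PUSHINT", 40), ("PUSH", 0), ("GET",), ("PUSHBASIC", 2),
                           ("ADD",), ("MKINT",), ("UPDATE", 1)])
print(G[s[0]])  # ('INT', 42)
```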

The definition of the G-machine has certain similarities with the definition of the SECD machine [Lan64]. What is new in our model is that we describe, in the framework of the abstract machine, how lazy output is done and how updating and sharing in a graph are handled.

5.4 Initial and final state of the machine 

The initial configuration of the machine for a given program is shown in figure 4. The machine starts with an empty output, a code sequence c0 for evaluating and printing the start expression e0, an empty pointer stack, an empty basic value stack, an empty graph, an environment E0 containing the compiled code for the functions together with their arity, and an empty dump. Since the operators +, - etc. are represented as applications of the predefined functions add, sub etc. in unevaluated expression graphs, the code for these functions must also be present in E0. The machine stops when the state (o, (), (), (), G, E, ()) has been reached.

5.5 The evaluation mechanism 

The evaluation of the program is driven by PRINT, which, in the case of a list, starts the evaluation of the head and the tail of the list; see transition rules 1 to 4. Only the leaves of the printed data structure appear on the output; for instance, the list (2.3.nil).(5.nil).nil gives the output sequence 2 3 5.
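A small sketch of rules 1 to 4, assuming the list is already fully evaluated so that the EVAL instructions inserted by rule 3 can be omitted; the tuple node encoding is hypothetical. Pushing the tail before the head reproduces the head-first order of rule 3.

```python
from typing import List, Tuple, Union

# Assumed node encoding (already evaluated): ("INT", i), ("BOOL", b), ("NIL",)
# and ("CONS", head, tail). EVAL is omitted because every node is canonical.
Node = Tuple


def print_leaves(node: Node) -> List[Union[int, bool]]:
    out = []
    pending = [node]                 # pointer stack; the top is the last element
    while pending:
        n = pending.pop()            # PRINT consumes the top pointer
        if n[0] in ("INT", "BOOL"):  # rules 1 and 2: append the value to O
            out.append(n[1])
        elif n[0] == "CONS":         # rule 3: print the head first, then the tail
            pending.append(n[2])
            pending.append(n[1])
        elif n[0] != "NIL":          # rule 4: NIL produces no output
            raise ValueError(f"cannot print {n[0]}")
    return out


def cons(h: Node, t: Node) -> Node:
    return ("CONS", h, t)


NIL, two, three, five = ("NIL",), ("INT", 2), ("INT", 3), ("INT", 5)
print(print_leaves(cons(cons(two, cons(three, NIL)), cons(cons(five, NIL), NIL))))
# [2, 3, 5]
```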

The EVAL instruction reduces the graph pointed to by the pointer at the top of the stack to canonical form. If the top of the stack points to an apply node, transition rule 5, the rest of the code sequence and the stack except for the top element are pushed onto the dump, and the unwind state is entered, following the function parts of apply nodes and pushing the function pointers on the way (transition rule 7). When a function node has been reached and the stack is deep enough to contain all curried arguments to the function, rule 8, the stack is rearranged according to figure 5. The top m elements of the stack now point to the m curried arguments of the function, and below them there is a pointer to the apply node which is to be updated with the value of the application. The reason for rearranging the stack in this manner is, firstly, to make the function arguments easily accessible, and secondly, to access function arguments and local variables introduced by let and letrec expressions uniformly.

Figure 5: Rearrangement of the stack after unwind.

After the stack rearrangement the function code is executed; see also compilation scheme F. If there were too few curried arguments in the application, a premature return is performed, rule 9.
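The unwinding and rearrangement of rules 7 to 9 can be sketched as below for the case where EVAL started with a single pointer on a fresh stack. The string node names and the tuple encoding are assumptions. Applied to f (g.nil) 3 from figure 6, it leaves the argument of f on top, the apply node to be updated next, and the outer application below it.

```python
from typing import Dict, List, Tuple

# Assumed encoding: G maps node names to ("AP", fun, arg), ("FUN", f), ("INT", i), ...
Graph = Dict[str, tuple]


def unwind(G: Graph, arity: Dict[str, int], root: str) -> Tuple[str, List[str]]:
    """Rules 7-9 from a fresh stack: returns (function, stack with top first)."""
    spine = [root]                          # spine[0] is the top of the stack
    while G[spine[0]][0] == "AP":           # rule 7: push the function part
        spine.insert(0, G[spine[0]][1])
    f = G[spine[0]][1]                      # spine[0] is now a FUN node
    k = arity[f]
    if len(spine) - 1 < k:                  # rule 9: too few arguments
        return "partial", [root]            # root is already canonical
    # rule 8: n0.n1...nk.s => n1''...nk''.nk.s, where ni = AP ni' ni''
    args = [G[n][2] for n in spine[1:k + 1]]
    return f, args + spine[k:]


# f (g.nil) 3 from figure 6: R = AP A1 N3, A1 = AP F L
G = {"R": ("AP", "A1", "N3"), "A1": ("AP", "F", "L"), "F": ("FUN", "f"),
     "N3": ("INT", 3), "L": ("CONS", "Gf", "Nil"), "Gf": ("FUN", "g"), "Nil": ("NIL",)}
print(unwind(G, {"f": 1, "g": 1}, "R"))  # ('f', ['L', 'A1', 'R'])
```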

The RET instruction performs a return from EVAL if the function code has updated the apply node for the return value with an integer, boolean, nil or cons node, rule 10. If the updated node is an apply node or a function node, the unwind state is reentered to continue the reduction of the new graph. An example where this happens is shown in figure 6, which illustrates the reduction of the expression f (g.nil) 3, where f l = hd l and g x = 2 × x. The value of f (g.nil) is the function g, and f has one 'extra' argument supplied. After EVAL and two unwind transitions we have the configuration shown in 6(b); the top of the stack is then made to point to the argument of f, figure 6(c). The code for f then computes the value of hd l, which is the function g, and updates the apply node of the application f (g.nil) with the function node g, figure 6(d). Since the entire graph for which EVAL was called is not yet fully reduced, the RET 1 instruction of the code for f makes the machine reenter the unwind state, figure 6(e), and the top of the stack is made to point to the argument of g, figure 6(f). The code for g then computes the value of 2 × x and updates the top apply node with the integer 6. The RET 1 instruction of the function g finally performs a proper return from EVAL, figure 6(h).

The fact that 'extra' curried arguments can be applied to a function in this manner, and that in general we cannot know in advance how many, is the reason for accessing parameters and variables relative to the top of the stack (instead of relative to the bottom, which at first sight would perhaps seem more natural).
f l = hd l
g x = 2 × x

f: PUSH 0; EVAL; HD; EVAL; UPDATE 2; RET 1.
g: PUSHBASIC 2; PUSH 0; EVAL; GET; MUL; MKINT; UPDATE 2; RET 1.

Figure 6: Graph reduction when a function returns a function. Panels (a) to (h) show the reduction of f (g.nil) 3 to the integer 6.

5.6 Let and letrec expressions 

The code for a let or letrec expression constructs the graphs for the locally defined expressions and puts pointers to these graphs onto the stack. When leaving the code for the let or letrec expression, these stack elements are removed by the SLIDE instruction; see compilation rules E16, E17 etc. The recursive local definitions in letrec expressions are implemented by constructing cyclic graphs; see scheme Cletrec in table 3. As an example, consider the code sequence

C[letrec x = f x in x x] r n =
  Cletrec[x = f x] r[x = n+1] (n+1); C[x x] r[x = n+1] (n+1); SLIDE 1 =
  ALLOC 1; PUSHFUN f; PUSH 1; MKAP; UPDATE 1;
  PUSH 0; PUSH 1; MKAP; SLIDE 1.

Figure 7 shows some of the intermediate machine states when executing this code sequence. To construct the graph for x we must have a pointer to x; for this purpose a HOLE node is allocated by the ALLOC 1 instruction, and when f x has been constructed the HOLE node is updated with this graph.
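Replaying the code sequence above on a toy graph shows how ALLOC and UPDATE tie the knot. The interpreter below handles only the instructions this example needs; its integer node names and tuple nodes are assumptions.

```python
from itertools import count
from typing import Dict, List


def run(code, s: List[int], G: Dict[int, tuple]) -> List[int]:
    """Execute G-code on stack s (top last) and graph G (node name -> node tuple)."""
    fresh = count(len(G))                       # assumes names 0..len(G)-1 are taken
    for op, *args in code:
        if op == "ALLOC":                       # rule 21: m new HOLE nodes
            for _ in range(args[0]):
                n = next(fresh)
                G[n] = ("HOLE",)
                s.append(n)
        elif op == "PUSHFUN":
            n = next(fresh)
            G[n] = ("FUN", args[0])
            s.append(n)
        elif op == "PUSH":
            s.append(s[-1 - args[0]])
        elif op == "MKAP":                      # top = argument, next = function
            arg, fun = s.pop(), s.pop()
            n = next(fresh)
            G[n] = ("AP", fun, arg)
            s.append(n)
        elif op == "UPDATE":                    # node m below the top gets the top's contents
            G[s[-1 - args[0]]] = G[s[-1]]
            s.pop()
        elif op == "SLIDE":                     # keep the top, drop m entries below it
            top = s.pop()
            del s[len(s) - args[0]:]
            s.append(top)
        else:
            raise ValueError(op)
    return s


# Code for letrec x = f x in x x (section 5.6), starting from an empty stack.
CODE = [("ALLOC", 1), ("PUSHFUN", "f"), ("PUSH", 1), ("MKAP",), ("UPDATE", 1),
        ("PUSH", 0), ("PUSH", 1), ("MKAP",), ("SLIDE", 1)]
G: Dict[int, tuple] = {}
(result,) = run(CODE, [], G)
x = G[result][1]                   # result = AP x x
print(G[x][0], G[x][2] == x)       # AP True: x = AP f x points to itself
```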



6 Further improvements of the G-machine code 

This section discusses two kinds of improvement of the G-machine code which are not embodied in the compiler given in the previous section: improved tail recursive behaviour, and exploiting the knowledge that a variable has been evaluated previously. Both kinds of improvement are included in our compiler implementation.
Figure 7: Construction of a cyclic graph.

6.1 Tail recursion 

Graph reduction by successive rewritings to right hand sides gives us a loop-like behaviour for tail recursive calls. However, this desirable property is not preserved by the compilation schemes given in table 3, because compilation scheme F emits code for computing the value of the right hand side before updating with the result. Thus using scheme E in F is advantageous if the right hand side is an application of a primitive predefined function such as add, sub etc., but does not bring out the proper tail recursive behaviour if the right hand side is an application of a user-defined function. For instance, using the compilation rules in table 3, we have

F[g x = f 5] = E[f 5] [x = 2] 2; UPDATE 2; RET 1 =
PUSHFUN f; PUSHINT 5; MKAP; EVAL; UPDATE 2; RET 1.

Here the EVAL instruction is unnecessary, and in fact harmful, in that it creates 
another stack frame for the evaluation of f 5. If the EVAL instruction is removed from the 
code above, the UPDATE instruction will update with the apply node of f 5, and the RET 
instruction will make the machine re-enter the unwind state; no additional stack frame is 
created. 

Proper tail recursive behaviour can be reinstated into our compilation schemes by 
introducing yet another compilation scheme, $\mathcal{R}$ for return value, which preserves the 
context that the result is to be returned as the value of the current function evaluation. 
Starting with the compilation scheme $\mathcal{F}$, we then have 

$$\mathcal{F}[\![f\;x_1 \cdots x_m = e]\!] = \mathcal{R}[\![e]\!]\; [x_1 = m+1, \ldots, x_m = 2]\; (m+1)$$

where the code emitted by $\mathcal{R}$ also performs the updating and returning. To return the 
value of an application of a user defined function we can do a simplistic graph rewrite, by 

$$\mathcal{R}[\![f\;e_1 \cdots e_m]\!]\; r\; n = \mathcal{C}[\![f\;e_1 \cdots e_m]\!]\; r\; n;\; \texttt{UPDATE}\ n;\; \texttt{RET}\ (n-1)$$

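$\mathcal{R}$ can also be made to propagate down the branches of an if expression, by 

$$\mathcal{R}[\![\texttt{if}\; e_1\; e_2\; e_3]\!]\; r\; n = \mathcal{B}[\![e_1]\!]\; r\; n;\; \texttt{JFALSE}\ k;\; \mathcal{R}[\![e_2]\!]\; r\; n;\; \texttt{LABEL}\ k;\; \mathcal{R}[\![e_3]\!]\; r\; n$$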
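and down into the in-expression of let and letrec expressions, by 

$$\mathcal{R}[\![\texttt{let}\; d\; \texttt{in}\; e]\!]\; r\; n = \mathcal{C}_{\mathit{let}}[\![d]\!]\; r\; n;\; \mathcal{R}[\![e]\!]\; r'\; n'$$
$$\mathcal{R}[\![\texttt{letrec}\; d\; \texttt{in}\; e]\!]\; r\; n = \mathcal{C}_{\mathit{letrec}}[\![d]\!]\; r'\; n';\; \mathcal{R}[\![e]\!]\; r'\; n'$$

where $(r', n') = \mathcal{X}[\![d]\!]\; r\; n$. 

Figure 8: Rearranging the stack for tail calls. 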

The default case for $\mathcal{R}$ is 

$$\mathcal{R}[\![e]\!]\; r\; n = \mathcal{E}[\![e]\!]\; r\; n;\; \texttt{UPDATE}\ n;\; \texttt{RET}\ (n-1)$$
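The rules for $\mathcal{R}$ given so far can be tried out on small examples. The Python sketch below is a simplified, hypothetical rendering of schemes C, E, B, R and F for a tiny tuple-based expression language; the AST encoding and all function names are assumptions of the sketch. It treats every function constant as user defined, omits the special table 3 rules for primitives and for let/letrec, and does not include the JFUN shortcut. It reproduces the code for g x = f 5 without the EVAL, and shows $\mathcal{R}$ propagating into both branches of an if.

```python
import itertools

# Sketch of schemes C, E, B, R and F. Hypothetical AST:
#   ("int", i), ("var", x), ("fun", f), ("ap", e1, e2), ("if", e1, e2, e3)
# Every ("fun", ...) head is treated as a user defined function.

def C(e, r, n):
    """Construct the graph of e and push a pointer to it (n = stack depth)."""
    tag = e[0]
    if tag == "int":
        return [f"PUSHINT {e[1]}"]
    if tag == "fun":
        return [f"PUSHFUN {e[1]}"]
    if tag == "var":
        return [f"PUSH {n - r[e[1]]}"]
    if tag == "ap":                       # function part first, then the argument
        return C(e[1], r, n) + C(e[2], r, n + 1) + ["MKAP"]
    raise ValueError(f"C: unsupported expression {e!r}")

def E(e, r, n):
    """Default rule: construct the graph, then evaluate it."""
    return C(e, r, n) + ["EVAL"]

def B(e, r, n):
    """Default rule: evaluate, then move the basic value to the V stack."""
    return E(e, r, n) + ["GET"]

def head(e):
    while e[0] == "ap":
        e = e[1]
    return e

def R(e, r, n, labels):
    """Compile e so that its value is returned from the current function."""
    if e[0] == "if":                      # propagate R into both branches
        k = f"L{next(labels)}"
        return (B(e[1], r, n) + [f"JFALSE {k}"] + R(e[2], r, n, labels)
                + [f"LABEL {k}"] + R(e[3], r, n, labels))
    if e[0] == "ap" and head(e)[0] == "fun":
        body = C(e, r, n)                 # user function: build the graph, no EVAL
    else:
        body = E(e, r, n)                 # default case
    return body + [f"UPDATE {n}", f"RET {n - 1}"]

def F(params, body):
    """Compile  f x1 ... xm = body  with x1 = m+1, ..., xm = 2 and depth m+1."""
    m = len(params)
    r = {x: m + 1 - i for i, x in enumerate(params)}
    return R(body, r, m + 1, itertools.count(1))

f5 = ("ap", ("fun", "f"), ("int", 5))
print(F(["x"], f5))   # the g x = f 5 example
```

No JMP is needed after the then-branch, because the code emitted by $\mathcal{R}$ always ends in RET.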

To return the value of the application $f\;e_1 \cdots e_m$, we can do even better by short-circuiting 
the unwind action which in this case follows the RET instruction. Provided the arity of 
f is m, which is required for the same apply node to be updated both by the 
calling function and by f, we can use 

$$\mathcal{R}[\![f\;e_1 \cdots e_m]\!]\; r\; n = \mathcal{S}[\![e_1 \cdots e_m]\!]\; r\; n;\; \texttt{JFUN}\ f$$

The new scheme $\mathcal{S}$ emits a code sequence that rearranges the stack in the manner shown in 
figure 8, after which a direct jump is made to the first instruction of f, thus turning 
tail recursion into loops in the G-machine code. Using this method on our little example 
above, assuming f takes only one argument, we would get 

$$\mathcal{F}[\![g\;x = f\;5]\!] = \texttt{PUSHINT 5; MOVE 1; JFUN f}$$

The new instructions MOVE and JFUN are defined by 

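$$(o,\; \texttt{MOVE}\ m.c,\; n_0 \cdots n_{m-1}.n_m.s,\; v,\; G,\; E,\; D) \Rightarrow (o,\; c,\; n_1 \cdots n_{m-1}.n_0.s,\; v,\; G,\; E,\; D)$$
$$(o,\; \texttt{JFUN}\ f.c,\; s,\; (),\; G,\; E[f = (a, c')],\; D) \Rightarrow (o,\; c',\; s,\; (),\; G,\; E[f = (a, c')],\; D)$$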
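The stack effect of MOVE can be checked with a few lines of Python. In this sketch the S stack is a list with its top at the end (a hypothetical representation), `move` follows the MOVE rule above, and the loop mimics 1000 successive arity-1 tail calls, each pushing a new argument pointer and executing MOVE 1. JFUN itself is only indicated by a comment, since jumping to code is outside the scope of the sketch.

```python
# Sketch: the stack effect of MOVE m. Hypothetical representation: the S stack
# is a Python list whose top is the last element, so n_k is stack[-1 - k].

def move(stack, m):
    """MOVE m:  n0 . n1 ... n_m . s  =>  n1 ... n_{m-1} . n0 . s"""
    stack[-1 - m] = stack[-1]   # overwrite the slot of n_m with the top pointer n0
    stack.pop()                 # then pop n0 itself

# Entry to a function of arity 1: argument on top, root apply node below it.
stack = ["root", "x"]
for i in range(1000):           # 1000 successive arity-1 tail calls
    stack.append(f"arg {i}")    # push the new argument (e.g. PUSHINT 5)
    move(stack, 1)              # MOVE 1: the new argument replaces the old one
    # JFUN would now jump to the callee's code, reusing this stack frame
print(len(stack), stack)        # prints: 2 ['root', 'arg 999']
```

However long the chain of tail calls, the stack keeps the same depth, and the root apply node stays in place to be updated by the last function in the chain.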



6.2 On evaluated variables 

The first time EVAL is executed for a particular variable, its graph is reduced to canonical 
form, and subsequent EVALs on the same variable have no effect. By keeping track of 
which variables have been evaluated in each function, we can avoid emitting more than one 
EVAL instruction for each variable. For example, to compute the basic value of the 
expression x × x, table 3 gives us the code sequence 

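$$\mathcal{B}[\![\texttt{mul}\; x\; x]\!]\; [x = 1]\; 1 = \mathcal{B}[\![x]\!]\; [x = 1]\; 1;\; \mathcal{B}[\![x]\!]\; [x = 1]\; 1;\; \texttt{MUL}$$
$$= \texttt{PUSH 0; EVAL; GET; PUSH 0; EVAL; GET; MUL}$$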
Here the second EVAL instruction is clearly useless and can be eliminated. Apart from 
removing a useless EVAL instruction, this also improves the conditions for target code 
generation from the G-code, since we get longer code sequences unbroken by calls to EVAL, 
and can thus keep values in machine registers a bit longer. 
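The bookkeeping can be sketched as a variant of scheme $\mathcal{B}$ for variables that consults a set of already evaluated variables. The set and the function names below are hypothetical. The sketch is only valid for straight-line code: at an if expression each branch would need its own copy of the set, and only variables evaluated before the branch are known to be evaluated after it.

```python
# Sketch: scheme B for variables and binary primitives that remembers which
# variables are already evaluated in the current straight-line code. The
# `evaluated` set and the function names are hypothetical.

def B_var(x, r, n, evaluated):
    code = [f"PUSH {n - r[x]}"]
    if x not in evaluated:
        code.append("EVAL")     # first use: reduce the graph of x to canonical form
        evaluated.add(x)        # the node is updated in place, so later uses need no EVAL
    return code + ["GET"]

def B_binop(op, x, y, r, n, evaluated):
    # B leaves the depth of the S stack unchanged, so both operands use n
    return B_var(x, r, n, evaluated) + B_var(y, r, n, evaluated) + [op]

print(B_binop("MUL", "x", "x", {"x": 1}, 1, set()))
# ['PUSH 0', 'EVAL', 'GET', 'PUSH 0', 'GET', 'MUL']
```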

Because of the cost involved in constructing and reducing expression graphs, it 
is cheaper to evaluate some expressions directly than to construct their graphs, even if 
the value is not going to be used. This is the case for expressions built only from constants, 
variables which have been evaluated previously, and arithmetic and logical primitive functions 
(we ignore the problem of overflow and other exceptions). As an example, consider 
construction of the expression 2 × x + y. The compilation rules in table 3 give us 

$$\mathcal{C}[\![\texttt{add}\; (\texttt{mul}\; 2\; x)\; y]\!]\; [x = 2, y = 1]\; 2$$
$$= \texttt{PUSHFUN add; PUSHFUN mul; PUSHINT 2; MKAP; PUSH 2; MKAP; MKAP; PUSH 2; MKAP}$$

If the variable x has been previously evaluated, it is safe to compute the value of 2 × x 
directly, and we can instead use the code sequence 

PUSHFUN add; PUSHBASIC 2; PUSH 1; GET; MUL; MKINT; MKAP; PUSH 2; MKAP. 

and if both x and y have been previously evaluated, we can use the code sequence 

PUSHBASIC 2; PUSH 0; GET; MUL; PUSH 1; GET; ADD; MKINT. 
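A sketch of how a code generator might decide when direct computation is safe: the Python code below (a hypothetical AST and hypothetical function names, covering only add, sub and mul) extends scheme $\mathcal{C}$ so that a primitive application whose leaves are all constants or evaluated variables is compiled with $\mathcal{B}$ followed by MKINT. When it does build a graph it emits one MKAP per argument, as in the letrec example. It compiles 2 × x + y under three assumptions: nothing evaluated, only x evaluated, and both x and y evaluated.

```python
# Sketch: scheme C extended to compute strict primitive expressions directly
# when every variable in them is known to be evaluated. Hypothetical AST:
#   ("int", i), ("var", x), ("fun", f), ("ap", e1, e2)
PRIMS = {"add": "ADD", "sub": "SUB", "mul": "MUL"}

def prim_call(e):
    """Return (OP, a, b) if e is  p a b  for a primitive p, else None."""
    if e[0] == "ap" and e[1][0] == "ap":
        f = e[1][1]
        if f[0] == "fun" and f[1] in PRIMS:
            return PRIMS[f[1]], e[1][2], e[2]
    return None

def computable(e, evaluated):
    """Constants, evaluated variables, and primitives applied to such terms."""
    if e[0] == "int":
        return True
    if e[0] == "var":
        return e[1] in evaluated
    p = prim_call(e)
    return p is not None and computable(p[1], evaluated) and computable(p[2], evaluated)

def B(e, r, n):
    """Value onto the V stack; only called on computable expressions."""
    if e[0] == "int":
        return [f"PUSHBASIC {e[1]}"]
    if e[0] == "var":                      # already evaluated, so no EVAL needed
        return [f"PUSH {n - r[e[1]]}", "GET"]
    op, a, b = prim_call(e)
    return B(a, r, n) + B(b, r, n) + [op]  # B leaves the S stack depth at n

def C(e, r, n, evaluated):
    if prim_call(e) is not None and computable(e, evaluated):
        return B(e, r, n) + ["MKINT"]      # compute now, then box the result
    if e[0] == "int":
        return [f"PUSHINT {e[1]}"]
    if e[0] == "fun":
        return [f"PUSHFUN {e[1]}"]
    if e[0] == "var":
        return [f"PUSH {n - r[e[1]]}"]
    if e[0] == "ap":                       # function part, argument, MKAP
        return C(e[1], r, n, evaluated) + C(e[2], r, n + 1, evaluated) + ["MKAP"]
    raise ValueError(f"C: unsupported expression {e!r}")

def ap(*es):
    """Left-nested application helper: ap(f, a, b) = ((f a) b)."""
    e = es[0]
    for arg in es[1:]:
        e = ("ap", e, arg)
    return e

expr = ap(("fun", "add"), ap(("fun", "mul"), ("int", 2), ("var", "x")), ("var", "y"))
env = {"x": 2, "y": 1}
for known in (set(), {"x"}, {"x", "y"}):
    print(sorted(known), C(expr, env, 2, known))
```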

When dealing with expressions with list values the situation is similar. For instance, 
consider construction of the expression tl l, as in the function definition 

f l = if null l then ... else g (tl l) 

Because of the test in the condition part of the if-expression, not only do we know for 
sure that l has been evaluated; in the else part of the if-expression we can also assert that 
l is in cons form. To construct the expression tl l, instead of using 

PUSHFUN tl; PUSH 2; MKAP 

we can use the code sequence 

PUSH 1; TL. 

Not only does this avoid allocation of an apply node, it also removes the overhead of 
executing the code for the tl function when function g calls for evaluation of its argument. 

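When a variable with a list value cannot be determined statically to be in cons form, 
we can test for this dynamically, with the instructions MKHD and MKTL, used in the following 
compilation rules. 

$$\mathcal{C}[\![\texttt{hd}\; e]\!]\; r\; n = \mathcal{C}[\![e]\!]\; r\; n;\; \texttt{MKHD}$$
$$\mathcal{C}[\![\texttt{tl}\; e]\!]\; r\; n = \mathcal{C}[\![e]\!]\; r\; n;\; \texttt{MKTL}$$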

MKHD and MKTL test whether the top of the stack is in cons form; if it is, they behave 
as the HD and TL instructions respectively, and otherwise they construct the graphs of hd e and tl e. 
These instructions are defined by 
$$(o,\; \texttt{MKHD}.c,\; n.s,\; v,\; G[n = \texttt{CONS}\ n_1\ n_2],\; E,\; D) \Rightarrow (o,\; c,\; n_1.s,\; v,\; G[n = \texttt{CONS}\ n_1\ n_2],\; E,\; D)$$

otherwise: 

$$(o,\; \texttt{MKHD}.c,\; n.s,\; v,\; G,\; E,\; D) \Rightarrow (o,\; c,\; n_1.s,\; v,\; G[n_1 = \texttt{AP}\ n_2\ n,\; n_2 = \texttt{FUN}\ \mathit{hd}],\; E,\; D)$$

and similarly for MKTL. 
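The two cases of MKHD and MKTL can be sketched on a small heap model. In the Python code below (hypothetical representation and names), nodes are mutable lists such as ["CONS", head, tail] and ["AP", fun, arg], and the S stack is a list with its top at the end. For simplicity the sketch allocates a fresh FUN node, whereas the machine would presumably point to a shared function node.

```python
# Sketch of MKHD and MKTL on a hypothetical heap of mutable lists
# ["CONS", head, tail], ["AP", fun, arg], ["FUN", name], ... The S stack is
# a list whose top is the last element.

def mk_select(stack, which):
    """MKHD (which="hd") or MKTL (which="tl")."""
    n = stack.pop()
    if n[0] == "CONS":                       # already in cons form: act like HD / TL
        stack.append(n[1] if which == "hd" else n[2])
    else:                                    # otherwise build the graph  hd n / tl n
        stack.append(["AP", ["FUN", which], n])

cell = ["CONS", ["INT", 1], ["NIL"]]
s = [cell]
mk_select(s, "tl")
print(s[-1])                                 # prints: ['NIL']
```

In the second case the graph is left for later evaluation, so MKHD and MKTL are safe whatever form the top of the stack is in.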

The analysis shown above can detect opportunities for transforming call-by-name into 
call-by-value only locally within a function. A more general method would be to use a 
global analysis, as described in [Myc80]. A future version of our compiler may include such an 
analysis phase. 

7 Implementation 

This section discusses some features of our compiler implementation of the G-machine 
concept. The source language is a completely functional variant of ML [GMW79], with 
call-by-name semantics. The last phase of the compiler translates the G-machine code 
into target code for the VAX-11 computer. 

7.1 Compiler organisation 

The compiler is organised into the following parts: 

Syntax analysis: Builds an abstract syntax tree of the program. 

Type checking: Checks that the program is well-typed, using a polymorphic type check- 
ing algorithm [Mil78]. 

Program transformation: Transforms the program into a set of functions, possibly 
mutually recursive, as described in section 5.1. Also, the user defined data types 
and pattern matching are transformed into simpler constructs. 

Value analysis: Performs the analysis on evaluated variables as discussed in section 6.2. 

G-code generation: Translates the functions into G-machine code. 

Target code generation: Translates the G-machine code into assembly code for the 
VAX-11 computer. 

The entire compiler, except for the syntax analysis, has been written in fc [Aug82], 
a functional language with lazy evaluation; fc is a forerunner of the present implementation, 
based on our earlier ideas of compiled graph reduction [Joh81]. We are currently in the 
process of rewriting the compiler in its own language. 
7.2 Target code generation 



For target code generation, the components of the G-machine state are mapped onto the 
target computer in the following way: 

O is printed on standard output. 

C is the target code of the currently executed function, and the program counter. 

S is an area for the pointer stack, and a stack pointer register (called ep). 

V is the system stack and stack pointer (sp). 

G is a large heap area divided into two equally sized halves, and a register (called hp) 
used as heap pointer, pointing to the next free location (see below). 

E is the target code for the functions, with code that performs a check on the number of arguments. 

D is the system stack and stack pointer (sp). Only pointers into the S stack and into the 
system stack are pushed, not entire stacks and dumps as the description of the abstract 
machine suggests. 

Both the V stack and the dump D are mapped onto the same stack in the target 
machine. This is possible because values pushed onto the V stack are only used locally, 
within the function that pushed them. 

The garbage collector is a variant of Fenichel and Yochelson's copying garbage collector 
[FY69], adapted to variable-sized cells, and works as follows. The heap is divided into two equally 
sized areas. Memory is allocated from one area at a time by simply incrementing the heap 
pointer hp, and when one area runs out of memory, the entire reachable graph is copied into 
the other heap area, leaving the garbage behind; the pointers on the pointer stack S are 
updated at the same time. In the target code, before an instruction sequence that allocates 
a certain amount of memory, a check is made whether that amount of memory is available on 
the heap; if not, the garbage collector is invoked. A disadvantage of this method of memory 
management is that only half of the total available memory can be utilised; however, on computers 
with large virtual address spaces this is not a serious problem. To its advantage, the time used 
for garbage collection is proportional to the size of the graph (not to the size of the heap 
area, as it is for mark-scan methods), so collection takes little time for small graphs. 
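The copying scheme can be sketched in a few lines. The Python code below is an illustration under several assumptions: the heap is a flat list of words, each cell starts with a tag that determines its size and which fields are pointers, and a copied cell is overwritten by a forwarding marker followed by its new address. It scans the copied cells breadth-first in the style of Cheney's algorithm, which may differ from the traversal order of the collector described here, and it returns updated root pointers in the way the real collector updates the S stack.

```python
# Sketch of a two-space copying collector for variable-sized cells.
# Hypothetical layout: a cell is [tag, field, ...] stored in a flat list of
# words; SIZE gives the cell size in words and POINTERS the offsets of the
# fields that hold heap addresses. Every cell has at least two words, which
# leaves room for a forwarding marker and the new address.
SIZE = {"AP": 3, "CONS": 3, "INT": 2, "FUN": 2}
POINTERS = {"AP": (1, 2), "CONS": (1, 2), "INT": (), "FUN": ()}

def collect(from_space, roots):
    """Copy the cells reachable from roots; return (to_space, new_roots)."""
    to_space = []

    def forward(addr):
        if from_space[addr] == "FWD":             # already copied
            return from_space[addr + 1]
        tag = from_space[addr]
        new = len(to_space)                       # allocate by bumping the free pointer
        to_space.extend(from_space[addr:addr + SIZE[tag]])
        from_space[addr] = "FWD"                  # leave a forwarding address behind
        from_space[addr + 1] = new
        return new

    new_roots = [forward(a) for a in roots]       # e.g. the pointers on the S stack
    scan = 0
    while scan < len(to_space):                   # fix up pointers in copied cells
        tag = to_space[scan]
        for offset in POINTERS[tag]:
            to_space[scan + offset] = forward(to_space[scan + offset])
        scan += SIZE[tag]
    return to_space, new_roots

# INT 7 at 0, garbage INT 99 at 2, a cyclic CONS at 4, and an AP at 7.
heap = ["INT", 7, "INT", 99, "CONS", 0, 4, "AP", 4, 0]
new_heap, new_roots = collect(heap, [7])
print(new_heap, new_roots)
```

The work done is proportional to the number of reachable words copied, which matches the observation above that collection is cheap when the graph is small; the unreachable INT 99 cell is never touched.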

Target code generation is done by deferring some operations on the pointer stack 
S and the basic value stack V, and instead simulating the contents of their topmost elements. 
Thus instructions such as PUSHINT, PUSHFUN and PUSHBASIC, which push constants, 
push these constants onto the simulated stacks in the code generator. The instruction 
MKAP, for instance, then takes its two arguments from the simulated stack if it is nonempty, 
and otherwise from the real stack. To bring out the main idea, a simple example of target 
code generation is shown in figure 9, which constructs the graph for the expression 3.f 5. 
In the simulated stack, fun f refers to a pointer to the function node for f, int i refers to a 
pointer to an integer node with value i, and heap n refers to a pointer into the heap at location n. 
In the code, newly created nodes on the heap are referred to relative to the hp register, 
and since node allocation changes the value of hp, we also need to carry along the current 
relative value of hp, called HP. Function nodes, integer nodes, boolean nodes and the nil 
node are not allocated on the heap each time; instead, pointers to nodes in a constant area 
are used. (The simulated V stack is irrelevant for this example and is not shown.) 

A further possibility, not shown in this example, is to allocate machine registers 
for entries in the simulated stacks, particularly for V stack entries holding the results of 
the usual arithmetic operations. 
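The deferred-push technique used in figure 9 can be sketched as a small code generator. The Python class below is hypothetical: the symbol names $APPLY, $CONS, $I_n and $C_F follow the figure, but the class, its methods and its string output format are assumptions of the sketch, and only the instructions used in the figure are modelled (there is no fallback to the real stack when the simulated one is empty). Running it on PUSHINT 3; PUSHFUN f; PUSHINT 5; MKAP; CONS yields the VAX instructions listed in figure 9.

```python
# Sketch of code generation with a simulated S stack, in the style of figure 9.
# Symbol names ($APPLY, $CONS, $I_n, $C_F) follow the figure; the class and
# method names are hypothetical.

class SimulatedStackCodegen:
    WORD = 4                      # bytes in a VAX longword

    def __init__(self):
        self.sim = []             # simulated S stack, top at the end
        self.HP = 0               # hp relative to its value at the start
        self.asm = []             # emitted assembler lines

    def _source(self, item):
        """Instruction and source operand that yield the pointer for item."""
        kind, val = item
        if kind == "int":
            return "movl", f"$I_{val}"             # node in the constant area
        if kind == "fun":
            return "movl", f"$C_{val.upper()}"     # node in the constant area
        return "moval", f"{val - self.HP}(hp)"     # node built earlier on the heap

    def _alloc(self, tag, *fields):
        start = self.HP
        self.asm.append(f"movl ${tag},(hp)+")
        self.HP += self.WORD
        for field in fields:
            op, src = self._source(field)
            self.asm.append(f"{op} {src},(hp)+")
            self.HP += self.WORD
        self.sim.append(("heap", start))

    def pushint(self, i):
        self.sim.append(("int", i))                # no code emitted

    def pushfun(self, f):
        self.sim.append(("fun", f))                # no code emitted

    def mkap(self):
        arg, fun = self.sim.pop(), self.sim.pop()  # argument on top
        self._alloc("APPLY", fun, arg)

    def cons(self):
        tail, head = self.sim.pop(), self.sim.pop()  # tail on top
        self._alloc("CONS", head, tail)

    def flush(self):
        """Move the simulated entries to the real S stack, bottom-most first."""
        for item in self.sim:
            op, src = self._source(item)
            self.asm.append(f"{op} {src},-(ep)")
        self.sim.clear()

gen = SimulatedStackCodegen()
gen.pushint(3)
gen.pushfun("f")
gen.pushint(5)
gen.mkap()
gen.cons()
gen.flush()
print("\n".join(gen.asm))
```

The three pushes emit no code at all; only the allocations and the final move to the real stack do, which is the saving the simulated stack is meant to provide.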
G-code      VAX assembler code      HP   Simulated S stack       Remark
                                    0    ()                      Start configuration
PUSHINT 3                           0    int 3.()                Push pointer to integer constant 3
PUSHFUN f                           0    fun f.int 3.()          Push pointer to function node for f
PUSHINT 5                           0    int 5.fun f.int 3.()    Push pointer to integer constant 5
MKAP        movl $APPLY,(hp)+       4                            Tag of apply node to heap ...
            movl $C_F,(hp)+         8                            Fun. part = fun f to heap ...
            movl $I_5,(hp)+         12   heap 0.int 3.()         Arg. part = int 5 to heap
CONS        movl $CONS,(hp)+        16                           Tag of cons node to heap ...
            movl $I_3,(hp)+         20                           Head part = int 3 to heap ...
            moval -20(hp),(hp)+     24   heap 12.()              Tail part = result of MKAP to heap
            moval -12(hp),-(ep)     24   ()                      Move result to real S stack

Figure 9: Target code generation from graph construction code. 

The target code is assembled in the usual manner, and loaded together with the 
runtime system to make an executable file. The runtime system contains code for PRINT, 
EVAL, unwind, the garbage collector, and also target code for the primitive predefined 
functions add, sub etc. 

7.3 Performance 

We have compared our implementation with a couple of other implementations of func- 
tional languages that have been available to us, both with strict and lazy evaluation. The 
implementations in the table below are the following: 

1. Our implementation; lazy evaluation, executes VAX-11 code. 

2. Cardelli's ML system [Car84]; strict evaluation, executes VAX-11 code. 

3. The Liszt Lisp compiler under UNIX; strict evaluation, executes VAX-11 code. 

4. The ML implementation in the LCF system; strict evaluation, interprets Lisp. 

5. SASL, based on the SECD machine [Tur75]; lazy evaluation, interpretative. 

6. C compiler under UNIX (applies only to the Fibonacci program). 

The table below shows the execution time in seconds for three programs: fib(20) using 
fib(n) = if n < 2 then 1 else fib(n - 1) + fib(n - 2), primes up to 300 using the sieve of 
Eratosthenes, and insertion sort of 100 random elements. 





Execution time (seconds), by implementation

Program        1.     2.     3.     4.     5.     6.
Fibonacci      0.92   0.5    1.1    46     31     0.46
Primes         0.50   1.2    1.1    29     20     -
Insert sort    0.37   1.0    0.8    15     12     -

The programs above have been chosen so that the results are the same independent of 
whether lazy or strict evaluation is used, but in general lazy evaluation permits a more 
direct programming style. It should be noted that in our Fibonacci program, in the 
recursive call to fib the arguments are passed by value, due to the analysis on evaluated 
variables described in section 6.2. 
8 Related work 



Jones and Muchnick [JM82] give an alternative evaluation mechanism for combinator 
expressions, with a compilation algorithm which translates combinators into fixed-program 
code for a stack machine. 

Hudak's combinator based compiler [HK84] resembles our work in many respects. 
He uses the standard combinators as a convenient intermediate language for performing 
program transformations and optimisations. The program is then converted into one 
containing macro-combinators, which are similar to Hughes' super-combinators [Hug82] and 
our global function definitions. Each macro-combinator is then translated into code for a 
conventional machine. 

Dick Kieburtz et al. at the Oregon Graduate Center are currently designing 
and implementing a VLSI chip for the G-machine. 